Skip to content

[rust/rqd] Fix process cache and timeout feature#2160

Open
aoblet wants to merge 2 commits intoAcademySoftwareFoundation:masterfrom
aoblet:fix/rust-rqd
Open

[rust/rqd] Fix process cache and timeout feature#2160
aoblet wants to merge 2 commits intoAcademySoftwareFoundation:masterfrom
aoblet:fix/rust-rqd

Conversation

@aoblet
Copy link
Contributor

@aoblet aoblet commented Jan 26, 2026

Link the Issue(s) this Pull Request is related to.
#2159
#2158

Summarize your change.
Update rust rqd to:

  • Be resilient with various Linux OS versions
  • Fix timeout feature

Expected format is in microsecond, rust is generating
timestamp in second.

We conform before sending to the dispatcher
@DiegoTavares
Copy link
Collaborator

This should fix the pipeline: fe1b1e2
It should be merged later today and reintegrated into your branch.

layer_id: self.request.layer_id.clone(),
num_cores: self.request.num_cores,
start_time: start_time as i64,
start_time: (start_time * 1000) as i64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share where you're seeing a problem with this value? Looking at our environment, startime seems to be reported correctly, but I might be looking at a different view.

Copy link
Contributor Author

@aoblet aoblet Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/AcademySoftwareFoundation/OpenCue/blob/master/cuebot/src/main/java/com/imageworks/spcue/dispatcher/HostReportHandler.java#L839-L843

frame.getStartTime() returns in seconds instead of microseconds.
Then the condition is always true as long as a timeout is set on the layer.

Comment on lines 528 to 531
let start = stat.find('(').unwrap();
let end = stat.find(')').unwrap();
let fields: Vec<&str> = stat[end+2..].split_whitespace().collect();
let session_id: u32 = fields[3].parse().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth documenting with a comment what's the expected format.

Based on rqmachine.py on the old rqd, there is some variation on the /stat format that we might have to account for:

        Stats file can star with these formats:
         - 105 name ...
         - 105 (name) ...
         - 105 (name with space) ...
         - 105 (name with) (space and parenthesis) ...

For the first example, with no parenthesis, the code would panic on your unwrap() call.

I think it's worth creating an isolated function to part /stat to allow adding some unit tests to the bottom of this files to ensure every variation is covered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 103274b.

  • Keep NSsid logic and fallback on stat file fetching
  • Better rust idiom logic
  • Refacto to a specific method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DiegoTavares what do you think ?

Depending on multiple criterias proc/status might not
contain NSSid field, they can be:
  - OS and version
  - Kernel version
  - SELinux state
  - App Armor
  - OS

So instead of parsing proc/status file we revert back to
python rqd implementation using proc/stat.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants